Improvise a Jazz Solo with an LSTM Network

Welcome to your final programming assignment of this week! In this notebook, you will implement a model that uses an LSTM to generate music. You will even be able to listen to your own music at the end of the assignment.

You will learn to:

- Apply an LSTM to music generation
- Generate your own jazz music with deep learning


Please run the following cell to load all the packages required in this assignment. This may take a few minutes.
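For reference, the kinds of packages involved look roughly like the following. This is a sketch of the main imports (older standalone Keras); the exact cell in your notebook may also import music and data helper utilities.

```python
# A sketch of the main imports this assignment relies on; your notebook's
# cell may differ and may include additional helper modules.
import numpy as np
from keras.models import Model
from keras.layers import Dense, Input, LSTM, Reshape, Lambda, RepeatVector
from keras.optimizers import Adam
from keras.utils import to_categorical
```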

1 - Problem statement

You would like to create a jazz music piece specially for a friend's birthday. However, you don't play any instruments and don't know music composition. Fortunately, you know deep learning and will solve this problem using an LSTM network.

You will train a network to generate novel jazz solos in a style representative of a body of performed work.

1.1 - Dataset

You will train your algorithm on a corpus of Jazz music. Run the cell below to listen to a snippet of the audio from the training set:

We have taken care of the preprocessing of the musical data to render it in terms of musical "values."

Details about music (optional)

You can informally think of each "value" as a note, which comprises a pitch and duration. For example, if you press down a specific piano key for 0.5 seconds, then you have just played a note. In music theory, a "value" is actually more complicated than this--specifically, it also captures the information needed to play multiple notes at the same time. For example, when playing a music piece, you might press down two piano keys at the same time (playing multiple notes at the same time generates what's called a "chord"). But we don't need to worry about the details of music theory for this assignment.

Music as a sequence of values

Run the following code to load the raw music data and preprocess it into values. This might take a few minutes.

You have just loaded the following:

1.2 - Overview of our model

Here is the architecture of the model we will use. This is similar to the Dinosaurus model, except that you will implement it in Keras.

Overview of parts 2 and 3

2 - Building the model

Sequence generation uses a for-loop

Shareable weights

  1. Define the layer objects (we will use global variables for this).
  2. Call these objects when propagating the input.
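For example, the shared layer objects might be defined once, as globals, roughly like this (the notebook provides these for you; n_values = 78 and n_a = 64 are consistent with the parameter counts shown later):

```python
# Defined once as globals and reused at every time step, so the weights are shared.
n_values = 78   # number of unique musical values
n_a = 64        # dimension of the LSTM hidden state

reshapor = Reshape((1, n_values))                # reshapes x to (1, n_values)
LSTM_cell = LSTM(n_a, return_state=True)         # a single LSTM cell, called Tx times
densor = Dense(n_values, activation='softmax')   # maps the hidden state to a distribution over values
```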

3 types of layers: Reshape(), LSTM(), and Dense()

Exercise: Implement djmodel().

Inputs (given)

Step 1: Create an empty list "outputs" to save the outputs of the LSTM Cell at every time step.

Step 2: Loop through time steps (TODO)

2A. Select the 't' time-step vector from X.

Lambda layer

2B. Reshape x to be (1,n_values).

2C. Run x through one step of LSTM_cell.

2D. Dense layer

2E. Append output

Step 3: After the loop, create the model

Create the model object
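Putting the steps above together, a minimal sketch of djmodel() could look like the following, assuming the shared reshapor, LSTM_cell, and densor objects defined earlier (your implementation in the notebook may differ in details):

```python
def djmodel(Tx, n_a, n_values):
    # Define the model inputs: the sequence X and the LSTM's initial states
    X = Input(shape=(Tx, n_values))
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a, c = a0, c0

    # Step 1: empty list to collect the output of the LSTM cell at every time step
    outputs = []

    # Step 2: loop over the time steps
    for t in range(Tx):
        # 2A. Select the 't'-th time-step vector from X with a Lambda layer
        x = Lambda(lambda z: z[:, t, :])(X)
        # 2B. Reshape x to (1, n_values)
        x = reshapor(x)
        # 2C. One step of the shared LSTM cell
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        # 2D. Apply the Dense (softmax) layer to the hidden state
        out = densor(a)
        # 2E. Append the output
        outputs.append(out)

    # Step 3: create the model instance
    model = Model(inputs=[X, a0, c0], outputs=outputs)
    return model
```

Note that reshapor, LSTM_cell, and densor are the same objects at every iteration, which is what makes the weights shared across time steps.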

Expected Output
Scroll to the bottom of the output, and you'll see the following:

Total params: 41,678
Trainable params: 41,678
Non-trainable params: 0

Compile the model for training
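The compile step might look roughly like this, with the Adam optimizer and a categorical cross-entropy loss (the specific learning-rate and decay values here are illustrative):

```python
# Adam optimizer with illustrative hyperparameters; the loss is categorical
# cross-entropy because each output is a softmax over the possible values.
opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
```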

Initialize hidden state and cell state

Finally, let's initialize a0 and c0 for the LSTM's initial state to be zero.
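Concretely, that amounts to zero-valued arrays with one row per training example (m = 60 matches the "60/60" shown in the training log below):

```python
m = 60                      # number of training examples in this dataset
a0 = np.zeros((m, n_a))     # initial hidden state
c0 = np.zeros((m, n_a))     # initial cell state
```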

Train the model
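Fitting might then look like the sketch below. Because the model has one output per time step, the labels are passed as a list with one array per output; this assumes Y is arranged with the time dimension first, so that list(Y) yields one (m, n_values) array per step.

```python
# One label array per time step, matching the model's list of outputs
model.fit([X, a0, c0], list(Y), epochs=100)
```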

Expected Output

The model loss will start high (around 100), and after 100 epochs it should be in the single digits. These won't be the exact numbers you'll see, due to the random initialization of the weights.
For example:

Epoch 1/100
60/60 [==============================] - 3s - loss: 125.7673
...

Scroll to the bottom to check Epoch 100

...
Epoch 100/100
60/60 [==============================] - 0s - loss: 6.1861

Now that you have trained a model, let's go to the final section to implement an inference algorithm, and generate some music!

3 - Generating music

You now have a trained model which has learned the patterns of the jazz soloist. Let's now use this model to synthesize new music.

3.1 - Predicting & Sampling

At each step of sampling, you will:

- Take as input the hidden state a and cell state c produced by the LSTM at the previous time step.
- Forward propagate one step of the LSTM to get a new hidden state and cell state.
- Apply the Dense softmax layer to the new hidden state to get a distribution over the possible values.
- Select the next value and feed it back in as the input x for the next time step.

Initialization

Exercise: Implement the inference model, music_inference_model().

If you pre-define a function, you can do the same thing:

```python
def add_one(x):
    return x + 1

# use the add_one function inside of a Lambda layer
result = Lambda(add_one)(input_var)
```

Step 3: Inference Model:

This is how to use the Keras Model class:

```python
model = Model(inputs=[input_x, initial_hidden_state, initial_cell_state], outputs=the_outputs)
```

Run the cell below to define your inference model. This model is hard coded to generate 50 values.
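A minimal sketch of such an inference model is shown below. It reuses the shared LSTM_cell and densor, takes the arg-max of each step's softmax output, one-hot encodes it, and feeds it back in as the next input x. The function name and the details here are illustrative and may differ from your notebook's cell.

```python
from keras import backend as K

def music_inference_model(LSTM_cell, densor, n_values=78, n_a=64, Ty=50):
    # Inputs: a seed value x0 and the LSTM's initial states
    x0 = Input(shape=(1, n_values))
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a, c, x = a0, c0, x0

    outputs = []
    for t in range(Ty):
        # One step of the shared LSTM cell, then the softmax layer
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        out = densor(a)
        outputs.append(out)

        # Pick the most likely value, one-hot encode it, and feed it back in
        x = Lambda(lambda z: K.one_hot(K.argmax(z), n_values))(out)
        x = RepeatVector(1)(x)   # back to shape (1, n_values) per example

    return Model(inputs=[x0, a0, c0], outputs=outputs)
```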

Expected Output

If you scroll to the bottom of the output, you'll see:

Total params: 41,678
Trainable params: 41,678
Non-trainable params: 0

Initialize inference model

The following code creates the zero-valued vectors you will use to initialize x and the LSTM state variables a and c.
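Roughly, these are just zero arrays shaped to match the inference model's three inputs (assuming n_values = 78 and n_a = 64 as above; the variable names here are illustrative):

```python
x_initializer = np.zeros((1, 1, n_values))   # zero-valued seed input
a_initializer = np.zeros((1, n_a))           # zero-valued initial hidden state
c_initializer = np.zeros((1, n_a))           # zero-valued initial cell state
```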

Exercise: Implement predict_and_sample().

Step 1: Use your inference model to predict an output, given the zero-valued initializers for x, a, and c defined above.

Step 2: Convert the prediction into a numpy array of indices by taking the argmax over the values dimension at each time step.

Step 3: Convert the indices into their one-hot vector representations.
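A minimal sketch of predict_and_sample(), following the three steps above (this assumes the initializer arrays sketched earlier; your notebook's version may differ in details):

```python
def predict_and_sample(inference_model, x_initializer, a_initializer, c_initializer):
    n_values = x_initializer.shape[-1]
    # Step 1: predict the list of per-step softmax outputs from the inference model
    pred = inference_model.predict([x_initializer, a_initializer, c_initializer])
    # Step 2: take the argmax over the values dimension to get one index per step
    indices = np.argmax(np.array(pred), axis=-1)
    # Step 3: convert the indices to their one-hot vector representations
    results = to_categorical(indices, num_classes=n_values)
    return results, indices
```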

Expected (Approximate) Output:

**np.argmax(results[12])** = 1
**np.argmax(results[17])** = 42
**list(indices[12:18])** = [array([1]), array([42]), array([54]), array([17]), array([1]), array([42])]

3.3 - Generate music

Finally, you are ready to generate music. Your RNN generates a sequence of values. The following code generates music by first calling your predict_and_sample() function. These values are then post-processed into musical chords (meaning that multiple values or notes can be played at the same time).

Most computational music algorithms use some post-processing because it is difficult to generate music that sounds good without it. The post-processing cleans up the generated output, for example by making sure the same sound is not repeated too many times and that two successive notes are not too far from each other in pitch. One could argue that many of these post-processing steps are hacks; much of the music generation literature has focused on hand-crafting post-processors, and a lot of the output quality depends on the quality of the post-processing, not just the quality of the RNN. But this post-processing does make a huge difference, so let's use it in our implementation as well.

Let's make some music!

Run the following cell to generate music and record it into your out_stream. This can take a couple of minutes.

To listen to your music, click File -> Open..., then go to "output/" and download "my_music.midi". Either play it on your computer with an application that can read MIDI files, or use one of the free online "MIDI to mp3" conversion tools to convert it to mp3.

As a reference, here is a 30 second audio clip we generated using this algorithm.

Congratulations!

You have come to the end of the notebook.

What you should remember

- A sequence model can be used to generate musical values, which are then post-processed into MIDI music.
- A fairly similar model can be used to generate dinosaur names or to generate music, with the major difference being the input fed to the model.
- In Keras, sequence generation involves defining layers with shared weights, which are then repeated for the different time steps 1, ..., Tx.

Congratulations on completing this assignment and generating a jazz solo!

References

The ideas presented in this notebook came primarily from three computational music papers cited below. The implementation here also took significant inspiration and used many components from Ji-Sung Kim's GitHub repository.

We're also grateful to François Germain for valuable feedback.